Dataset trans #13065

100pah · 2020-07-31T12:51:05Z

Brief Information

This pull request is in the type of:

bug fixing
new feature
others

Support dataset transform for:

Declarative and serializable data processing config.
Enable integrating third-party data processing tools as plugins of echarts.

Details

Dataset config can declare transforms that generate a new data source.
Third-party transforms can be registered.
A transform may have multiple inputs and multiple outputs, but in most cases it has one input and one output.
Transforms can be piped.
Transform parameters are declarative and serializable wherever possible. Callbacks might also be supported if necessary.

General

For example, suppose we have the following data:

var SALES_DATA = [
    ['Product', 'Sales', 'Price', 'Year'],
    ['Cake', 123, 32, 2011],
    ['Cereal', 231, 14, 2011],
    ['Tofu', 235, 5, 2011],
    ['Dumpling', 341, 25, 2011],
    ['Biscuit', 122, 29, 2011],
    ['Cake', 143, 30, 2012],
    ['Cereal', 201, 19, 2012],
    ['Tofu', 255, 7, 2012],
    ['Dumpling', 241, 27, 2012],
    ['Biscuit', 102, 34, 2012],
    ['Cake', 153, 28, 2013],
    ['Cereal', 181, 21, 2013],
    ['Tofu', 395, 4, 2013],
    ['Dumpling', 281, 31, 2013],
    ['Biscuit', 92, 39, 2013],
    ['Cake', 223, 29, 2014],
    ['Cereal', 211, 17, 2014],
    ['Tofu', 345, 3, 2014],
    ['Dumpling', 211, 35, 2014],
    ['Biscuit', 72, 24, 2014],
];

We can make three pies like this:

var option = {
    dataset: [{
        source: SALES_DATA
    }, {
        transform: {
            type: 'filter',
            config: { dimension: 'Year', value: 2011 }
        }
    }, {
        transform: {
            type: 'filter',
            config: { dimension: 'Year', value: 2012 }
        }
    }, {
        transform: {
            type: 'filter',
            config: { dimension: 'Year', value: 2013 }
        }
    }],
    series: [{
        type: 'pie', radius: 50, center: ['25%', '50%'],
        datasetIndex: 1
    }, {
        type: 'pie', radius: 50, center: ['50%', '50%'],
        datasetIndex: 2
    }, {
        type: 'pie', radius: 50, center: ['75%', '50%'],
        datasetIndex: 3
    }],
};

Or:

var option1 = {
    dataset: [{
        source: SALES_DATA
    }, {
        // Pipe the transforms (filter and sort). This is a short-cut.
        transform: [{
            type: 'filter',
            config: { dimension: 'Product', value: 'Tofu' }
        }, {
            type: 'sort',
            config: { dimension: 'Sales', order: 'asc' }
        }]
    }],
    legend: {},
    xAxis: {},
    yAxis: { type: 'category' },
    series: [{
        type: 'bar',
        name: 'Tofu',
        encode: { y: 'Year', x: 'Sales' },
        datasetIndex: 1
    }],
};
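
To make the piped example concrete, the sketch below reproduces by hand what a `filter` followed by a `sort` is expected to yield. It is plain JavaScript and does not call ECharts; `SAMPLE`, `pipeFilterSort` and `TOFU_ROWS` are hypothetical names used only for illustration, and `SAMPLE` is a subset of the sales rows above.

```javascript
// A hand-written stand-in for the piped transform, not the ECharts implementation.
const SAMPLE = [
    ['Product', 'Sales', 'Price', 'Year'],
    ['Cake', 123, 32, 2011],
    ['Tofu', 235, 5, 2011],
    ['Tofu', 255, 7, 2012],
    ['Tofu', 395, 4, 2013],
    ['Tofu', 345, 3, 2014]
];

function pipeFilterSort(table, filterDim, filterValue, sortDim) {
    const [header, ...rows] = table;
    const filterIdx = header.indexOf(filterDim);
    const sortIdx = header.indexOf(sortDim);
    const result = rows
        // Step 1: keep only the rows whose filter dimension equals the value.
        .filter(row => row[filterIdx] === filterValue)
        // Step 2: order the remaining rows ascending by the sort dimension.
        .sort((a, b) => a[sortIdx] - b[sortIdx]);
    return [header, ...result];
}

const TOFU_ROWS = pipeFilterSort(SAMPLE, 'Product', 'Tofu', 'Sales');
// Sales column of the data rows, in order: 235, 255, 345, 395
```

The output of the first step is the input of the second, which is the idea behind writing `transform` as an array.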

Filter transform

transform: {
    type: 'filter',
    config: {}
},

The config is a "conditional expression option", which can be:

type ConditionalExpressionOption =
    true | false | RelationalExpressionOption | LogicalExpressionOption;
type LogicalExpressionOption = {
    and?: LogicalExpressionSubOption[];
    or?: LogicalExpressionSubOption[];
    not?: LogicalExpressionSubOption;
};
type LogicalExpressionSubOption =
    LogicalExpressionOption | RelationalExpressionOption | TrueFalseExpressionOption;
type RelationalExpressionOption = {
    dimension?: string | number;
    parse?: 'time' | 'trim';
    lt?: OptionDataValue; // less than
    lte?: OptionDataValue; // less than or equal
    gt?: OptionDataValue; // greater than
    gte?: OptionDataValue; // greater than or equal
    eq?: OptionDataValue; // equal
    ne?: OptionDataValue; // not equal
    '<'?: OptionDataValue; // lt
    '<='?: OptionDataValue; // lte
    '>'?: OptionDataValue; // gt
    '>='?: OptionDataValue; // gte
    '='?: OptionDataValue; // eq
    '!='?: OptionDataValue; // ne
    '<>'?: OptionDataValue; // ne (SQL style)
    reg?: RegExp | string; // RegExp
}

For example:

// Parse time and then use arithmetic operators.
config: {
    dimension: 'Year', '>=': '2016-02', '<': '2016-03', parse: 'time'
}
// Logical expression option
config: {
    and: [{
        dimension: 'Sex', eq: 'male'
    }, {
        or: [{
            // Regular expressions are supported, similar to SQL `LIKE "%Smith%"`.
            dimension: 'Name', reg: /(\s|^)Su(\s|$)/
        }, {
            dimension: 'Name', reg: /(\s|^)Smith(\s|$)/
        }]
    }]
}
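
One way to read the option type above is as a small expression tree. The following is a minimal sketch of an evaluator for it, under several assumptions that the PR text does not confirm: a row is a plain object keyed by dimension name, several operators in one relational object must all hold, a string `reg` is compiled with `new RegExp`, and `parse: 'time'` is left out. `evalCondition`, `COMPARATORS` and `ALIASES` are hypothetical names, not ECharts APIs.

```javascript
// Hypothetical evaluator for illustration; not the ECharts implementation.
const COMPARATORS = {
    lt: (a, b) => a < b,
    lte: (a, b) => a <= b,
    gt: (a, b) => a > b,
    gte: (a, b) => a >= b,
    eq: (a, b) => a === b,
    ne: (a, b) => a !== b
};
// Symbolic operators map onto the named ones.
const ALIASES = { '<': 'lt', '<=': 'lte', '>': 'gt', '>=': 'gte', '=': 'eq', '!=': 'ne', '<>': 'ne' };

function evalCondition(cond, row) {
    if (typeof cond === 'boolean') return cond;
    // Logical nodes recurse into their sub-expressions.
    if (cond.and) return cond.and.every(sub => evalCondition(sub, row));
    if (cond.or) return cond.or.some(sub => evalCondition(sub, row));
    if (cond.not) return !evalCondition(cond.not, row);

    // Relational node: read the value, optionally trim it, then test every operator.
    let value = row[cond.dimension];
    if (cond.parse === 'trim' && typeof value === 'string') value = value.trim();

    return Object.keys(cond).every(key => {
        if (key === 'dimension' || key === 'parse') return true;
        if (key === 'reg') {
            const re = cond.reg instanceof RegExp ? cond.reg : new RegExp(cond.reg);
            return re.test(String(value));
        }
        const compare = COMPARATORS[ALIASES[key] || key];
        if (!compare) throw new Error('Unknown operator: ' + key);
        return compare(value, cond[key]);
    });
}

const maleSuOrSmith = {
    and: [
        { dimension: 'Sex', eq: 'male' },
        { or: [
            { dimension: 'Name', reg: /(\s|^)Su(\s|$)/ },
            { dimension: 'Name', reg: /(\s|^)Smith(\s|$)/ }
        ] }
    ]
};
const matchesSmith = evalCondition(maleSuOrSmith, { Sex: 'male', Name: 'John Smith' });   // true
const matchesSusan = evalCondition(maleSuOrSmith, { Sex: 'male', Name: 'Susan' });        // false
```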

Sort transform

transform: {
    type: 'sort',
    config: { dimension: 'Price', order: 'asc' }
}
// or
transform: {
    type: 'sort',
    // multiple sort
    config: [
        { dimension: 'Price', order: 'asc' },
        { dimension: 'Year', order: 'desc', parse: 'time' }
    ]
}

By default, the raw values are compared with JS relational operators.
If parse: 'time' is specified, the parsed values are compared instead.
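
The multiple-sort form can be thought of as a comparator that walks the config list and only moves to the next entry when the current one ties. The sketch below illustrates that idea on raw values with JS relational operators; `makeComparator` and the sample rows are hypothetical, and `parse` handling is omitted.

```javascript
// Hypothetical multi-key comparator; not the ECharts implementation.
function makeComparator(orderList) {
    return (a, b) => {
        for (const { dimension, order } of orderList) {
            const va = a[dimension];
            const vb = b[dimension];
            if (va < vb) return order === 'desc' ? 1 : -1;
            if (va > vb) return order === 'desc' ? -1 : 1;
            // Equal on this dimension: fall through to the next sort key.
        }
        return 0;
    };
}

// Illustrative rows only.
const sortedRows = [
    { Product: 'Cake', Price: 30, Year: 2012 },
    { Product: 'Tofu', Price: 5, Year: 2011 },
    { Product: 'Cake', Price: 30, Year: 2014 }
].sort(makeComparator([
    { dimension: 'Price', order: 'asc' },
    { dimension: 'Year', order: 'desc' }
]));
// Order: Tofu (5, 2011), Cake (30, 2014), Cake (30, 2012)
```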

How to output multiple results

                dataset: [{
                    source: rawData
                }, {
                    transform: {
                        type: 'echarts-extension:boxplot'
                    }
                    // This transform outputs two results.
                    // The first result is the so-called "main result",
                    // which can be referenced directly.
                }, {
                    fromDatasetIndex: 1,
                    fromTransformResult: 1
                    // Use `fromTransformResult` to retrieve the extra result from
                    // the previous dataset.
                }],
                series: [{
                    name: 'boxplot',
                    type: 'boxplot',
                    // Reference the dataset 1
                    datasetIndex: 1
                }, {
                    name: 'outlier',
                    type: 'scatter',
                    // Reference the dataset 2
                    datasetIndex: 2
                }]

Register third-party transform

const myTransform = {
    // A namespace prefix is required (here: `my`).
    type: 'my:regression',
    transform: function (params) {
        // All upstream datasets, if the transform uses more than one.
        const upstreamSourceList = params.sourceList;
        // The first upstream dataset.
        const upstreamSource = params.source;

        const dimensionInfoAll = upstreamSource.getDimensionInfoAll();
        const dimensionInfo = upstreamSource.getDimensionInfo('Year');

        const dataItem = upstreamSource.getRawDataItem(4);
        const headerItem = upstreamSource.getRawHeaderItem(1);

        const resultData = [
            [...],
            [...],
            ...
        ];

        // ...
        return { data: resultData };
    }
};

// Register after `myTransform` is defined: a `const` cannot be used before its declaration.
echarts.registerTransform(myTransform);
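
As a smaller, runnable illustration of the same contract, here is a sketch of a hypothetical transform that keeps rows at or above a threshold. It runs against a mock upstream source rather than ECharts. Assumptions not confirmed by this PR text: the transform config arrives as `params.config`, `getDimensionInfo` returns an object with an `index` field, and the source offers a `count()` method. The names `my:minValue`, `minValueTransform` and `createMockSource` are made up for the example.

```javascript
// Hypothetical third-party transform; every name under `my:` is illustrative.
const minValueTransform = {
    type: 'my:minValue',
    transform: function (params) {
        const source = params.source;
        const config = params.config; // assumed: config is passed on `params`
        // Assumed: dimension info exposes the column index as `index`.
        const dimIndex = source.getDimensionInfo(config.dimension).index;
        const resultData = [];
        // Assumed: `count()` reports the number of data rows.
        for (let i = 0; i < source.count(); i++) {
            const item = source.getRawDataItem(i);
            if (item[dimIndex] >= config.min) {
                resultData.push(item);
            }
        }
        return { data: resultData };
    }
};

// Mock upstream source so the sketch runs without ECharts.
function createMockSource(header, rows) {
    return {
        count: () => rows.length,
        getRawDataItem: i => rows[i],
        getDimensionInfo: name => ({ name: name, index: header.indexOf(name) })
    };
}

const mockResult = minValueTransform.transform({
    source: createMockSource(
        ['Product', 'Sales'],
        [['Cake', 123], ['Tofu', 235], ['Biscuit', 72]]
    ),
    config: { dimension: 'Sales', min: 100 }
});
// mockResult.data holds the Cake and Tofu rows.
```

In a real chart the object would be passed to `echarts.registerTransform` and referenced by its `type` from a dataset.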

Current boxplot case

var option = {
    dataset: [{
        source: rawData
    }, {
        transform: {
            type: 'boxplot'
        }
    }, {
        fromDatasetIndex: 1,
        fromTransformResult: 1
    }],
    series: [{
        name: 'boxplot',
        type: 'boxplot',
        datasetIndex: 1
    }, {
        name: 'outlier',
        type: 'scatter',
        datasetIndex: 2
    }],
    ...
};

Current ecStat case

echarts.registerTransform(...);

// Regression:
var option = {
    dataset: [{
        source: rawData
    }, {
        transform: {
            type: 'ecStat:regression',
            config: {
                method: 'exponential'
            }
        }
    }, {
        fromDatasetIndex: 1,
        fromTransformResult: 1
    }],
    legend: {
        bottom: 20
    },
    tooltip: {
    },
    xAxis: {
        type: 'category',
    },
    yAxis: {
    },
    series: [{
        name: 'scatter',
        type: 'scatter',
        datasetIndex: 0
    }, {
        name: 'regression',
        type: 'line',
        symbol: 'none',
        datasetIndex: 1
    }]
};

Debug for users

Set print: true to print the transform result data to the browser console.
This feature only works in dev mode.

dataset: {
    transform: {
        type: 'filter',
        config: { ... },
        print: true
    }
}

New option

type DatasetOption = {
    fromDatasetIndex?: number;
    fromDatasetId?: string;
    transform?: DataTransformOption | PipedDataTransformOption;
    // When a transform produces more than one result, the extra results can only be referenced
    // by using `fromDatasetIndex`/`fromDatasetId` together with `fromTransformResult` to retrieve
    // them from another dataset.
    fromTransformResult?: number;
};
interface DataTransformOption {
    type: DataTransformType;
    config: DataTransformConfig;
    // Print the result via `console.log` when the transform is performed. Only works in dev mode, for debugging.
    print?: boolean;
}
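
The examples in this description rely on the default upstream dataset or on `fromDatasetIndex`. A sketch of the `fromDatasetId` variant might look like the following. It assumes each dataset can carry an `id` that `fromDatasetId` matches against, and it writes the filter with the `eq` operator from the option type; the ids `raw` and `sales2011` and the variable `optionById` are hypothetical.

```javascript
// Illustrative option only; the ids are made up for this sketch.
const optionById = {
    dataset: [{
        id: 'raw',
        source: [
            ['Product', 'Sales', 'Price', 'Year'],
            ['Cake', 123, 32, 2011],
            ['Cake', 143, 30, 2012]
        ]
    }, {
        id: 'sales2011',
        // Name the upstream dataset explicitly instead of relying on position.
        fromDatasetId: 'raw',
        transform: {
            type: 'filter',
            config: { dimension: 'Year', eq: 2011 }
        }
    }],
    series: [{
        type: 'pie',
        // Reference the transformed dataset by its position in the list.
        datasetIndex: 1
    }]
};
```

Referring to the upstream by id keeps the option valid if datasets are later reordered, which is the main reason to prefer it over an index.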

TODO

How to handle numeric-like strings and '-' representing no value?
Add more demos.
Upgrade ecStat to encapsulate as transforms for easy usage.
Add built-in transforms: aggregate, map/convert, merge/concat, expand, pivot, tree.
Add callback support for filter transform if needed.
Optimize filter transform if needed.
Integrate with some built-in component like legend.
Progressive-friendly consideration.

Memo

DO NOT expose the concept of a "data filter processor" to end users unless we are certain that the concept will never change.

Consider these cases that may tempt us to expose the concept of "data filter processor":

Use legend to control data items or series in bar/line/scatter under custom rules.
- Issue: what about the hover state?
Do "count"/"group by" after dataZoom has changed the window.
- Issue: the dataZoom filter is not always used (see filterMode) and may not be reliable (consider the case where we want to leave some extra points outside the window so that the line runs through the edge of the cartesian).

Drawbacks of exposing the concept:

It may place a great burden on future refactoring (for either functionality or performance). Can we really ensure that the data process stage will never be modified in the future?
It may be rarely used, since a new concept like this is not friendly to junior users.

The solution for the scenarios above:

Introduce new features within the scope of "legend component". Enhance "legend" to support more flexible control. And if some senior users intend to build their own legend UI outside or use API to implement the legend functionality, provide a "headless legend" for them.

Test cases

Currently:

test/data-transform.html
test/boxplot.html
test/data-transform-ecStat.html


echarts-bot · 2020-07-31T12:51:07Z

Thanks for your contribution!
The community will review it ASAP. In the meanwhile, please checkout the coding standard and Wiki about How to make a pull request.

The pull request is marked to be PR: author is committer because you are a committer of this project.

echarts-bot · 2020-07-31T13:25:39Z

Congratulations! Your PR has been merged. Thanks for your contribution! 👍

100pah added 10 commits July 26, 2020 19:32

ts: add types to some @ts-nocheck and remove some 'any'.

9d9ffce

Merge branch 'next' into dataset-trans

53d0200

# Conflicts: # src/chart/graph/GraphSeries.ts # src/chart/sankey/SankeySeries.ts # src/data/helper/dimensionHelper.ts # src/model/mixin/dataFormat.ts # src/util/graphic.ts # src/util/states.ts

feature: data transform

e9a2b0f

fix: fix dimension inherit rule.

f8a59df

Merge branch 'next' into dataset-trans

b6124d5

# Conflicts: # src/echarts.ts

ts: fix type.

c6465c9

ts: fix type.

4243980

feature: add boxplot transform.

9881368

test: add case for ecStat

2ce65aa

fix: prevent potential issue after the implementation of isString cha…

246667e

…nge.

echarts-bot bot added PR: author is committer PR: awaiting review labels Jul 31, 2020

pull-request-size bot added the size/XXL label Jul 31, 2020

feature: move boxplot transform internally.

feae3b4

pissang merged commit f050a8a into next Jul 31, 2020

echarts-bot bot removed the PR: awaiting review label Jul 31, 2020

pissang mentioned this pull request Aug 4, 2020

feat(pie): provide array data. close #11247 #12103

Closed

5 tasks
